Delinquency at Peer-to-Peer Lender Prosper.com

by Jason Carter


Introduction

What is Prosper

Prosper, or Prosper Marketplace, is a leader in the online peer-to-peer lending industry. Borrowers create profiles and listings on Prosper.com to request personal loans. Investors, either individuals or institutions, view a listing (the borrower’s loan request) and decide how much to lend towards the loan.

Interest rates are typically lower for the borrower than at a financial institution such as a bank. Multiple investors can also contribute to one borrower’s loan request, which limits any single investor’s exposure if the borrower defaults.

Why Prosper Loan Data

I’ve personally never used peer-to-peer lending, as a borrower or an investor, but I find the idea intriguing from both standpoints. Lower interest rates and the ability to get a loan for “small” items are the obvious pros for borrowers. For an investor, the appeal of portfolio diversification and a higher-return or less risky investment only holds if borrowers don’t default on their loans. So what are the delinquency rates? Is there a main contributing factor? Is there any correlation between where someone lives, their income, the purpose of their loan and defaulting on their loan? Delinquencies at Prosper.com and what they correlate with are what we’ll be exploring.

Delinquent is the failure to accomplish what is required by law or duty, such as the failure to make a required payment or to perform a certain action.

The term delinquent commonly refers to a situation where a borrower is late or overdue on a payment, such as income taxes, a mortgage, automobile loan or credit card account.

Investopedia

Data Overview

The Prosper loan dataset was provided by Udacity as part of their Data Set Options.

From Udacity

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

Dataset was last updated 03/11/2014

Process

Overall plan for the data exploration:

  1. load libraries
  2. load data
  3. summary look
  4. univariate plot+analysis
  5. bivariate plot+analysis
  6. multivariate plot+analysis
  7. final plots
  8. reflection

With 81 variables, it is useful to indicate which ones I’ll spend time looking at. I performed an initial review of all variable definitions and settled on a list of roughly 15 to 20 variables that are required, directly or indirectly, for the exploration.

Main Features of Interest

After reviewing the long list of variables and thinking through the different paths of investigation, I’ve decided to narrow my focus to delinquencies and their correlations. To this end, the main features of interest are:

  • Term: The length of the loan expressed in months.
  • LoanStatus: The current status of the loan.
    • Cancelled
    • Chargedoff
    • Completed
    • Current
    • Defaulted
    • FinalPaymentInProgress
    • PastDue (the PastDue status will be accompanied by a delinquency bucket)
  • BorrowerState: The two letter abbreviation of the state of the address of the borrower at the time the Listing was created.
  • ListingCategory: The category of the listing that the borrower selected when posting their listing.
    • 0 - Not Available
    • 1 - Debt Consolidation
    • 2 - Home Improvement
    • 3 - Business
    • 4 - Personal Loan
    • 5 - Student Use
    • 6 - Auto
    • 7 - Other
    • 8 - Baby&Adoption
    • 9 - Boat
    • 10 - Cosmetic Procedure
    • 11 - Engagement Ring
    • 12 - Green Loans
    • 13 - Household Expenses
    • 14 - Large Purchases
    • 15 - Medical/Dental
    • 16 - Motorcycle
    • 17 - RV
    • 18 - Taxes
    • 19 - Vacation
    • 20 - Wedding Loans
  • CreditScoreRangeLower: The lower value representing the range of the borrower’s credit score as provided by a consumer credit rating agency.
  • CreditScoreRangeUpper: The upper value representing the range of the borrower’s credit score as provided by a consumer credit rating agency.
  • BankcardUtilization: The percentage of available revolving credit that is utilized at the time the credit profile was pulled.
  • IncomeRange: The income range of the borrower at the time the listing was created.
  • TotalProsperLoans: Number of Prosper loans the borrower had at the time they created this listing. This value will be null if the borrower had no prior loans.
  • LoanOriginalAmount: The origination amount of the loan.
  • Investors: The number of investors that funded the loan.

Supporting Features

Supporting features…

  • ListingCreationDate: The date the listing was created.
  • Occupation: The Occupation selected by the Borrower at the time they created the listing.
  • IsBorrowerHomeowner: A Borrower will be classified as a homeowner if they have a mortgage on their credit profile or provide documentation confirming they are a homeowner.
  • BorrowerAPR: The Borrower’s Annual Percentage Rate (APR) for the loan.
  • BorrowerRate: The Borrower’s interest rate for this loan.
  • Recommendations: Number of recommendations the borrower had at the time the listing was created.
  • DebtToIncomeRatio:
  • StatedMonthlyIncome:

Exploratory Analysis

Univariate analysis

First, I want to review some basic values and descriptive statistics of the dataset. This is for two reasons: 1) to verify some values put forward by Udacity, such as the 81 variables and 113,937 observations, and 2) to determine whether anything about the data looks “weird” or stands out enough that I may want to dive deeper into it. Although I have a general direction, i.e. delinquencies and their correlations, if something equally or more interesting pops up from the descriptive stats, I’ll take a peek in that direction as well.

Let’s take a look at the dataset and verify the dimensions.

## [1] 113937     81

Now, I set up the new data frame by keeping only the main and supporting variables.

The column “ListingCategory” is actually named “ListingCategory..numeric.” in the data frame. Its values are numeric codes, each of which maps to a short text description of the category.

## [1] "ListingCategory..numeric."
## 'data.frame':    113937 obs. of  20 variables:
##  $ Term                     : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus               : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ BorrowerState            : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ ListingCategory..numeric.: int  0 2 0 16 2 1 1 2 7 7 ...
##  $ CreditScoreRangeLower    : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper    : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ BankcardUtilization      : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ IncomeRange              : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ TotalProsperLoans        : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ LoanOriginalAmount       : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ Investors                : int  258 1 41 158 20 1 1 1 1 1 ...
##  $ ListingCreationDate      : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ Occupation               : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ IsBorrowerHomeowner      : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ BorrowerAPR              : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate             : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ Recommendations          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ DebtToIncomeRatio        : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ StatedMonthlyIncome      : num  3083 6125 2083 2875 9583 ...
##  $ LoanCreationYear         : chr  "2007" "2014" "2007" "2012" ...

There are a few categorical features in the new dataset. At first glance, BorrowerState has 52 levels and Occupation has 68 levels (in both cases one level is the empty string, used when no value was recorded) for over 113k observations. It should be possible to group by occupation to see any trends.

Example of categorical values in “LoanStatus”

##  [1] "Cancelled"              "Chargedoff"            
##  [3] "Completed"              "Current"               
##  [5] "Defaulted"              "FinalPaymentInProgress"
##  [7] "Past Due (>120 days)"   "Past Due (1-15 days)"  
##  [9] "Past Due (16-30 days)"  "Past Due (31-60 days)" 
## [11] "Past Due (61-90 days)"  "Past Due (91-120 days)"

There are a few challenges here. If I’m investigating delinquencies, I need to settle on a definition, and since the Prosper data has several categories that could qualify, I need to do two things: 1) learn the difference between Chargedoff and Defaulted and 2) choose a cutoff for delinquency, i.e. if someone is 1-15 days late on a payment, is that delinquent? What about 61-90 days late?

Chargedoff:

A charge-off or chargeoff is the declaration by a creditor (usually a credit card account) that an amount of debt is unlikely to be collected. This occurs when a consumer becomes severely delinquent on a debt. Traditionally, creditors will make this declaration at the point of six months without payment. In the United States, Federal regulations require creditors to charge-off installment loans after 120 days of delinquency, while revolving credit accounts must be charged-off after 180 days

Defaulted:

In finance, default is failure to meet the legal obligations (or conditions) of a loan, for example when a home buyer fails to make a mortgage payment, or when a corporation or government fails to pay a bond which has reached maturity.

# New variable to be used to identify "delinquent" borrowers:
# defaulted, charged off, or more than 60 days past due
delinquent_statuses <- c("Defaulted", "Chargedoff",
                         "Past Due (61-90 days)",
                         "Past Due (91-120 days)",
                         "Past Due (>120 days)")
prosperloans$DelinquentBorrowers <- ifelse(prosperloans$LoanStatus %in% delinquent_statuses, 1, 0)

Before going too far, I was curious about how the observations in the dataset are distributed by creation date. This matters because it could bias further investigation. For example, if the majority of the data was observed during 2008-2009 (the financial crisis), this could skew the data towards delinquency. To analyze this I’m going to explore the ListingCreationDate feature along with LoanStatus.

To make this easier, I’ve decided to create a new variable which captures the year of the loan creation date.
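The code for this step is not shown. One simple way to derive the year, sketched here on a few made-up timestamps in the same format as ListingCreationDate (`listing_dates` and `loan_year` are hypothetical names), is to take the first four characters of the date string:

```r
# Toy timestamps in the same format as ListingCreationDate
listing_dates <- factor(c("2005-11-09 20:44:28.847000000",
                          "2007-08-26 19:09:29.263000000",
                          "2014-02-27 08:28:07.900000000"))

# Convert the factor to character first, then keep the first four characters
loan_year <- substr(as.character(listing_dates), 1, 4)
loan_year
```

This produces a character vector, which matches the `chr` type of LoanCreationYear in the structure output. Converting to character before calling `substr()` matters because the column is stored as a factor.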

Discuss - there is a drop in 2009 and a steady pick-up in the following years. Possible causes: the financial crisis? No investors? Competitors entering the market? Need to review further.

Looking at only delinquent borrowers, there is a possible lead-up to the financial crisis. Using the same scale, there are no issues in 2004/5, followed by an increase in the volume of defaulted and charged-off loans leading up to 2007/8, when they account for almost half of all borrowers.

Remember - this is the loan “creation” date, which is a good lesson in understanding the data and in how visuals can miscommunicate. The plot does not show loans that were delinquent in 2007/8. It shows loans created in 2007/8 that went delinquent at some point in time. I believe this is why there appears to be a lead-up to the financial crisis, i.e. borrowers took out loans at a “normal” rate from 2005 to 2008, but after the financial crisis in 2008 some could no longer pay, so loans created in 2006-2008 were the most susceptible to default.

This assumption also explains the dip in 2009 and the gradual increase afterwards: fewer borrowers (1st graph) and far fewer delinquents relative to the number of loans in good standing.

To further the analysis, some other interesting initial features to look at with regard to delinquency are: Term, LoanStatus, LoanOriginalAmount, IncomeRange, BorrowerState and ListingCategory (the reason for a loan).

Term and Loan Status

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   36.00   36.00   40.83   36.00   60.00

From the summary breakdown, the median term is 36 months (3 years; the mean is 40.83 months), and the mean original loan amount is $8,337, ranging from $1,000 to $35,000. This seems to cover the full range allowed by Prosper, although at the time of writing this analysis, Prosper has a minimum limit of $2,000. And, in what seems like a surprising number, these loans have 80 investors on average.

It’s also very clear that the majority of borrowers choose a 36-month term, with 60-month and 12-month terms the second and third most popular, respectively. Approximately 15% of all borrowers are delinquent on their loans. Not surprisingly, the distribution of delinquency across loan terms is very similar to that of all loans.
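The overall delinquency share and its breakdown by term can be computed directly from the 0/1 flag. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not taken from the Prosper data) to show the calculation:

```r
# Toy data: loan term in months and a 0/1 delinquency flag
toy <- data.frame(
  Term = c(12, 36, 36, 36, 36, 60, 60, 36),
  DelinquentBorrowers = c(0, 1, 0, 0, 0, 1, 0, 0)
)

# Overall share of delinquent loans (the mean of a 0/1 flag is a proportion)
overall_rate <- mean(toy$DelinquentBorrowers)

# Share of delinquent loans within each term
rate_by_term <- tapply(toy$DelinquentBorrowers, toy$Term, mean)

overall_rate
rate_by_term
```

Comparing the per-term shares rather than the raw counts makes it easier to judge whether delinquency really follows the overall term distribution or whether one term is over-represented.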

Original Loan Amount

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

The summary shows the mean ($8,337) well above the median ($6,500), which matches the right-skewed distribution in the graph. The spikes of borrowers at $10,000, $15,000 and $25,000 pull the mean higher. It’s not particularly surprising that borrowers gravitate to these “nice” numbers.

Zooming in to look at the details, specifically under $10,000, you can see multiple spikes at these “nice” numbers: 1000, 2000, …, 8000, 9000, 10000 dollars. Since most debt doesn’t come in nice round numbers, we can assume borrowers are probably rounding up instead of requesting exact dollar values for their loans. For example, if a borrower is $4,678.86 in debt, she is probably requesting a loan of $5,000.00. From the data provided, I don’t believe there is a specific way to determine whether this assumption is true. However, given that debt is a floating-point value with interest applied to it, it is quite unlikely that the majority of borrowers have debt that falls exactly on a round number.
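The rounding assumption cannot be proven from the data, but the size of the effect can at least be measured. As a rough sketch on made-up amounts (`loan_amounts` is a hypothetical vector standing in for LoanOriginalAmount), the share of loans that are exact multiples of $1,000 is:

```r
# Toy loan amounts; in the real data this would be LoanOriginalAmount
loan_amounts <- c(1000, 2500, 4000, 4678, 5000, 10000, 15000, 7250)

# TRUE where the amount is an exact multiple of 1000
is_round_1000 <- loan_amounts %% 1000 == 0

# Proportion of loans requested in round thousands
share_round <- mean(is_round_1000)
share_round
```

The same modulo check with a different divisor (for example 500) would show how sensitive the result is to the definition of a “nice” number.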

Income Range

# Plot the income range of all borrowers,
# with the categories in a meaningful order on the x axis
positions <- c("Not employed", "Not displayed", "$0", 
                "$1-24,999", "$25,000-49,999", "$50,000-74,999",
                "$75,000-99,999", "$100,000+")

ggplot(prosperloans, aes(IncomeRange)) + 
  scale_x_discrete(limits = positions) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_bar()

Discuss - the majority of borrowers have an income range between $25k and $75k. “Not employed” and “$0” seem to be negligible in comparison.

Borrowers State

ggplot(prosperloans,aes(BorrowerState)) +
  geom_bar() +
  scale_y_log10() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

BorrowerState has 52 levels. One of them is the empty string (no state recorded), which leaves 51 codes, consistent with the 50 states plus DC. On a linear scale California’s count dominates the graph, making it hard to distinguish the smaller states and the differences among them. Given that California is the most populous state in the US, a much higher borrower count there is probably not a good indicator of a trend. To fix this and to make any trend easier to see, a log10 transformation was applied to the y axis.

# Map the 2-letter abbreviations to their full names for the map plot
# (state.abb covers the 50 states only, so any other code becomes NA)
prosperloans$BorrowerStateFullName <- tolower(state.name[match(prosperloans$BorrowerState, state.abb)])

us <- map_data("state")

# Group by stateName and delinquent borrowers
grp_by_state <- prosperloans %>%
  group_by(BorrowerStateFullName, BorrowerState, DelinquentBorrowers) %>%
  summarise(count = n()) %>%
  filter(DelinquentBorrowers == 1)

# Plot US map with delinquent borrower data
ggplot() + 
  geom_map(data = us, map = us,
            aes(x = long, y = lat, map_id = region, label = region),
            fill="#ffffff", color="#ffffff", size=0.15) +
  geom_map(data = grp_by_state, map=us,
            aes(fill = count, map_id = BorrowerStateFullName),
            color = "#ffffff", size = 0.15) +
  scale_fill_continuous(low='lightgray', high='black', guide='colorbar') +
  labs(x="Delinquent Borrowers", y=NULL) +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  theme(panel.border = element_blank(),
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank())

Discuss - count of delinquent borrowers across the US. CA has the highest count, but it should be taken into consideration that CA is the most populous state. The middle of the country has the lowest counts of delinquent borrowers, with higher counts in the East, the West and Texas.
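Because raw counts favour populous states, one option is to compare the share of delinquent borrowers per state instead. The sketch below is an assumption about how that could be done, not part of the original analysis. It uses made-up data, and `toy` and `by_state` are hypothetical names.

```r
# Toy data: CA has the most borrowers, TX the highest delinquency share
toy <- data.frame(
  BorrowerState = c(rep("CA", 6), rep("TX", 2), rep("NY", 2)),
  DelinquentBorrowers = c(1, 1, 0, 0, 0, 0,  1, 0,  0, 0),
  stringsAsFactors = FALSE
)

# Delinquent borrowers and all borrowers per state
delinquent_count <- tapply(toy$DelinquentBorrowers, toy$BorrowerState, sum)
borrower_count <- tapply(toy$DelinquentBorrowers, toy$BorrowerState, length)

by_state <- data.frame(
  BorrowerState = names(delinquent_count),
  count = as.vector(delinquent_count),
  total = as.vector(borrower_count),
  rate = as.vector(delinquent_count / borrower_count),
  stringsAsFactors = FALSE
)

# Sort by rate, highest first
by_state[order(-by_state$rate), ]
```

In this toy example CA has the highest count but TX has the highest rate, which is the distinction the map of counts cannot show. Mapping `rate` to the fill colour instead of `count` would give a population-adjusted view.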

Listing Category

# Map numbers to category
listing_category <- c("Not Available", "Debt Consolidation", "Home Improvement", "Business",
                      "Personal Loan", "Student Use", "Auto", "Other", "Baby&Adoption",
                      "Boat", "Cosmetic Procedure", "Engagement Ring", "Green Loans",
                      "Household Expenses", "Large Purchases", "Medical/Dental", "Motorcycle",
                      "RV", "Taxes", "Vacation", "Wedding Loans")

# Create new variable for mapped names (R vectors are 1-indexed, so add 1 to the 0-based category code)
prosperloans$ListingCategoryFullName <- listing_category[(prosperloans$ListingCategory..numeric.)+1]

g <- ggplot(prosperloans, aes(ListingCategoryFullName, fill = factor(DelinquentBorrowers))) +
  geom_bar(position="dodge") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
g

# A log scale will help with comparison across all categories
g + scale_y_log10()

Discuss

Unidentifiable: Debt Consolidation (could be a mix of anything), Not Available, Other, Business (maybe)… The next “real” answers are Auto and Home Improvement.

The listing category values represent the purpose or reason why the borrower is requesting a loan. For example, if the value is “Auto”, perhaps the borrower recently purchased an expensive car and needs $10,000 as a down payment to lower her monthly bill.

My observation from this plot is that, although “Debt Consolidation” is by far the most common reason borrowers request a loan, this is misleading. Debt consolidation hides where the debt came from: Auto, Student Use and Taxes could all have contributed to someone’s debt, but because the borrower wants to consolidate it, they flag “Debt Consolidation” as the purpose. One thing we can take away from the high volume of debt consolidation requests is that many borrowers have debt from multiple sources.

Bivariate analysis

Continuing to look for trends in delinquency, I will investigate expected and unexpected relationships between features using scatterplots and by measuring their correlation (Pearson’s r or Spearman’s rho).

  • ListingCategory and LoanOriginalAmount (need to map listing to values)
  • BankcardUtilization and LoanStatus
  • BorrowerRate and LoanStatus
  • LoanStatus and Investors (are you more likely to default if you have fewer investors? If investors think you’re a good bet to pay off your loan, more of them invest in you. That would imply that having fewer investors means people tend to believe you are a bad investment and more likely to default. Is this true?)
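As a sketch of how the investor question in the list above could be checked, the example below computes Pearson and Spearman correlations between the number of investors and the 0/1 delinquency flag, plus the median number of investors per group. The data are made up and `toy` is a hypothetical name, so the numbers say nothing about the real Prosper loans.

```r
# Toy data: number of investors and a 0/1 delinquency flag
toy <- data.frame(
  Investors = c(1, 5, 20, 40, 80, 150, 200, 258),
  DelinquentBorrowers = c(1, 1, 0, 1, 0, 0, 0, 0)
)

# Pearson correlation with a binary variable (point-biserial correlation)
pearson <- cor(toy$Investors, toy$DelinquentBorrowers, method = "pearson")

# Rank-based alternative, less sensitive to the long right tail of Investors
spearman <- cor(toy$Investors, toy$DelinquentBorrowers, method = "spearman")

# Median number of investors for loans in good standing (0) and delinquent (1)
median_investors <- tapply(toy$Investors, toy$DelinquentBorrowers, median)

c(pearson = pearson, spearman = spearman)
median_investors
```

A negative coefficient here would be consistent with the hypothesis that loans with fewer investors default more often, but it would not show that the number of investors causes the outcome.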

  • Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?
  • Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?
  • What was the strongest relationship you found?

Listing Category and Loan Amount

Loan amount for different categories. How to compare this with delinquent loans?

ggplot(prosperloans, aes(ListingCategoryFullName, LoanOriginalAmount, group = ListingCategoryFullName)) +
  geom_boxplot() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Discuss - median, debt consolidation outliers, max loan amount. The baby/adoption and debt consolidation plots are surprisingly similar.

Loan Term and Loan Amount

# look into outliers and discrete x axis on correct terms
ggplot(prosperloans, aes(Term, LoanOriginalAmount, group = Term)) +
  geom_boxplot() +
  scale_x_continuous(breaks = c(0,12,36,60)) +
  theme_minimal()

Interest Rate and Listing Category

Is there a relationship between the rate someone gets and their reason for a loan?

ggplot(prosperloans, aes(ListingCategoryFullName, BorrowerRate, group = ListingCategoryFullName)) +
  geom_boxplot() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Discuss - personal loans have the lowest median borrower rate; debt consolidation, boat and baby loans have similar median rates.

remember to bring it back to delinquency

Loan Amount and Bankcard Utilization

# loan, bankcard and loan status (factor)
ggplot(prosperloans, aes(LoanOriginalAmount, BankcardUtilization)) +
  geom_point(alpha = 1/50) +
  theme_minimal() +
  scale_y_log10()
## Warning: Removed 7604 rows containing missing values (geom_point).

Discuss - over 7k borrowers have NA utilization, and some values are over 1 (100% utilization); I’m not sure how those were calculated. Even for lower loan amounts there is high card utilization.
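Before interpreting the plot, it helps to count the unusual values directly. A minimal sketch on a made-up vector (`utilization` is a hypothetical stand-in for BankcardUtilization):

```r
# Toy utilization values, including missing values and one value above 100%
utilization <- c(0, 0.21, NA, 0.04, 0.81, 1.25, NA, 0.99)

# Missing values (these are the rows ggplot2 warns about)
n_missing <- sum(is.na(utilization))

# Values above 1, i.e. more than 100% of available revolving credit
n_over_one <- sum(utilization > 1, na.rm = TRUE)

# Exact zeros: log10(0) is -Inf, so these cannot be drawn on a log scale
n_zero <- sum(utilization == 0, na.rm = TRUE)

c(missing = n_missing, over_one = n_over_one, zero = n_zero)
```

Counting the zeros separately is worthwhile because a log10 axis cannot place them, so the scatterplot alone may under-represent borrowers with no card utilization.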

Borrower Rate and APR

# useful in any way???
ggplot(prosperloans, aes(BorrowerRate, BorrowerAPR)) +
  geom_point(alpha = 1/50) +
  theme_minimal()
## Warning: Removed 25 rows containing missing values (geom_point).

Discuss - APR is annual cost of the loan to the Borrower, Rate is the interest rate to the borrower excluding fees.

Debt-to-Income Ratio

# monthly income vs debt-to-income ratio
# relationship between higher monthly income and debt:income
ggplot(prosperloans, aes(StatedMonthlyIncome, DebtToIncomeRatio, colour = factor(DelinquentBorrowers))) +
  geom_point(alpha = 0.1) +
  theme_minimal() +
  scale_x_continuous(limits = c(0,20000)) +
  scale_y_continuous(limits = c(0, 1)) +
  geom_density2d()
## Warning: Removed 10388 rows containing non-finite values (stat_density2d).
## Warning: Removed 10388 rows containing missing values (geom_point).

Discuss - the x axis is limited to 20k a month; there are outliers with zero income and with income over 20k (approx 10k observations are dropped from the plot). Most borrowers’ debt-to-income ratio is under 0.25 (or 0.5).
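Whether “most” borrowers fall under 0.25 or under 0.5 can be settled by computing both shares. The sketch below uses a handful of illustrative values (`dti` is a hypothetical stand-in for DebtToIncomeRatio, and the values are a toy sample, not a summary of the data):

```r
# Toy debt-to-income ratios, with one missing value
dti <- c(0.17, 0.18, 0.06, 0.15, 0.26, 0.36, 0.27, 0.24, 0.25, NA)

# Share of borrowers with a known ratio below each threshold
share_under_025 <- mean(dti < 0.25, na.rm = TRUE)
share_under_050 <- mean(dti < 0.5, na.rm = TRUE)

c(under_0.25 = share_under_025, under_0.50 = share_under_050)
```

Note that `na.rm = TRUE` computes the share among borrowers with a recorded ratio only, so the number of missing values should be reported alongside it.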

Look up “good” debt-to-income ratios.

# names(prosperloans)
ggplot(prosperloans, aes(BorrowerRate, DebtToIncomeRatio)) +
  geom_point(alpha = 0.01) +
  theme_minimal() +
  scale_y_log10()
## Warning: Removed 8554 rows containing missing values (geom_point).

Discuss -

Multivariate Analysis

  • Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?
  • Were there any interesting or surprising interactions between features?

Correlation Matrix

# Pearson correlation coefficients, using pairwise observations (default method)
ggcorr(prosperloans, label = TRUE, label_size = 3, hjust = 0.9, size = 3, color = "black")
## Warning in ggcorr(prosperloans, label = TRUE, label_size = 3, hjust =
## 0.9, : data in column(s) 'LoanStatus', 'BorrowerState', 'IncomeRange',
## 'ListingCreationDate', 'Occupation', 'IsBorrowerHomeowner',
## 'LoanCreationYear', 'BorrowerStateFullName', 'ListingCategoryFullName' are
## not numeric and were ignored

Discuss - Correlation matrix
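The numbers behind the plot can also be inspected as a plain matrix. The sketch below is one way to do that in base R on made-up data (`toy` is a hypothetical name): it keeps only the numeric columns, as the warning above says ggcorr does, and uses pairwise complete observations.

```r
# Toy data with one non-numeric column and one missing value
toy <- data.frame(
  BorrowerRate = c(0.158, 0.092, 0.275, 0.097, 0.208, 0.131),
  CreditScoreRangeLower = c(640, 680, 480, 800, 680, 740),
  BankcardUtilization = c(0, 0.21, NA, 0.04, 0.81, 0.39),
  LoanStatus = c("Completed", "Current", "Completed",
                 "Current", "Current", "Current"),
  stringsAsFactors = FALSE
)

# Keep numeric columns only; non-numeric ones cannot be correlated directly
numeric_cols <- sapply(toy, is.numeric)

# Pearson coefficients, each pair using the rows where both values are present
corr <- cor(toy[, numeric_cols], use = "pairwise.complete.obs",
            method = "pearson")
round(corr, 2)
```

With pairwise deletion each coefficient may be based on a different number of rows, which is worth keeping in mind when comparing cells of the matrix.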

Credit Score and Borrowing Rate

g1 <- ggplot(prosperloans, aes(CreditScoreRangeLower, BorrowerRate)) +
  geom_point(alpha = 0.1) +
  theme_minimal()

g2 <- ggplot(prosperloans, aes(CreditScoreRangeUpper, BorrowerRate)) +
  geom_point(alpha = 0.1) +
  theme_minimal()

grid.arrange(g1, g2, nrow = 1)
## Warning: Removed 591 rows containing missing values (geom_point).

## Warning: Removed 591 rows containing missing values (geom_point).

Discuss - shows the expected negative correlation: a lower credit score goes with a higher borrowing rate.

Credit Score and Loan Amount

g1 <- ggplot(prosperloans, aes(CreditScoreRangeLower, LoanOriginalAmount)) +
  geom_point(alpha = 0.1) +
  theme_minimal()

g2 <- ggplot(prosperloans, aes(CreditScoreRangeUpper, LoanOriginalAmount)) +
  geom_point(alpha = 0.1) +
  theme_minimal()

grid.arrange(g1, g2, nrow = 1)
## Warning: Removed 591 rows containing missing values (geom_point).

## Warning: Removed 591 rows containing missing values (geom_point).

Discuss - as expected, the higher the credit score, the higher the loan amount.

Loan Amount and Investors

# scale_x limits
ggplot(prosperloans, aes(LoanOriginalAmount, Investors)) +
  geom_point(alpha = 0.1) +
  theme_minimal()

Discuss -

Bankcard Utilization and Borrowing Rate

ggplot(prosperloans, aes(BankcardUtilization, BorrowerRate)) +
  geom_point(alpha = 0.1, position = "jitter") +
  scale_x_continuous(limits = c(0, 1)) +
  theme_minimal()
## Warning: Removed 13153 rows containing missing values (geom_point).

Discuss -

prosperloans <- transform(prosperloans, 
               IncomeRange = factor(IncomeRange, levels = positions, labels = positions))

ggplot(prosperloans, aes(LoanCreationYear, LoanOriginalAmount)) + 
  geom_boxplot() +
  scale_y_continuous(limits = c(0,20000)) +
  facet_grid(IncomeRange ~ .) +
  theme_minimal()
## Warning: Removed 5285 rows containing non-finite values (stat_boxplot).

Discuss - all income ranges display the same curve (distribution across 2007-2014). “Not displayed” is missing after 2008; possibly this category was removed as an option for users after 2008. Borrowers with $100,000+ consistently have a wider spread of loan amounts over the years.

ggplot(prosperloans, aes(LoanCreationYear, LoanOriginalAmount, colour = factor(DelinquentBorrowers))) + 
  geom_boxplot() +
  scale_y_continuous(limits = c(0,20000)) +
  facet_grid(IncomeRange ~ .) +
  scale_color_discrete(name = "Borrowers", breaks = c(0,1), labels = c("Good Standing", "Delinquent")) +
  theme_minimal()
## Warning: Removed 5285 rows containing non-finite values (stat_boxplot).

Final Plots

plot 1

bivariate plot

plot 2

US map

  • legend
  • State.abb
  • rename “count”

plot 3

correlation if any good

Reflection

stuff

Programming Environment

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.3 (El Capitan)
## 
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] GGally_1.0.1    gridExtra_2.2.1 mapproj_1.2-4   maps_3.1.0     
## [5] dplyr_0.4.3     ggplot2_2.1.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.4      knitr_1.12.3     magrittr_1.5     MASS_7.3-45     
##  [5] munsell_0.4.3    colorspace_1.2-6 R6_2.1.2         stringr_1.0.0   
##  [9] plyr_1.8.3       tools_3.3.0      parallel_3.3.0   grid_3.3.0      
## [13] gtable_0.2.0     DBI_0.4-1        htmltools_0.3.5  lazyeval_0.1.10 
## [17] digest_0.6.9     assertthat_0.1   reshape2_1.4.1   formatR_1.3     
## [21] evaluate_0.9     rmarkdown_0.9.6  labeling_0.3     stringi_1.0-1   
## [25] scales_0.4.0     reshape_0.8.5